Informedia: News-on-Demand Multimedia Information Acquisition
Authors: Hauptmann and Witbrock
Abstract
In theory, speech recognition technology can make any spoken words in video or audio media subject to text indexing, search and retrieval. This article describes the News-on-Demand application created within the Informedia™ Digital Video Library project and discusses how speech recognition is used for transcript creation from video, time alignment of closed-captioned transcripts, a speech query interface, and audio paragraph segmentation. Our results show that speech recognition accuracy varies dramatically depending on the quality and type of data used, but the system is quite useable with only moderate speech recognition accuracy.

1. What is Informedia: News-on-Demand

The Informedia™ digital video library project [Informedia95, Wactlar96] at Carnegie Mellon University is creating a digital library in which text, image, video and audio data are available for full content retrieval. News-on-Demand is an application within Informedia which monitors news from TV, radio and text sources and allows the user to retrieve news stories of interest.

This paper gives a brief overview of the Informedia digital video library project [Christel94a, Stevens94, Christel94b, Informedia95] followed by a detailed description of the News-on-Demand application [Hauptmann95]. Both the automated library creation process for News-on-Demand and the news library exploration process will be explained. We show how speech recognition fits into the various digital news library processing steps. Results are presented for speech recognition on actual broadcast news data. Finally we discuss some active areas of research relevant to the multimedia information acquisition and retrieval problem.

1.1 An Overview of the Informedia Digital Video Library Project

Vast digital libraries of information will soon become available on the World Wide Web as a result of emerging multimedia computing technologies.
However, it is not enough simply to store and play back information as many commercial video-on-demand services apparently intend to do. New technology is needed to organize and search these vast data collections, retrieve the most relevant selections, and permit them to be effectively reused. Through the integration of technologies from the fields of natural language understanding, image processing, speech recognition and video compression, the Informedia project [Christel94a] allows a user to explore multimedia data in depth as well as in breadth.

[Informedia: News-on-Demand - Multimedia Information Acquisition and Retrieval. In Intelligent Multimedia Information Retrieval, Mark T. Maybury, Ed., AAAI Press, pp. 213-239, 1997.]

The Informedia digital video library project goes far beyond the current paradigm of video-on-demand, where a user can select one video from a limited set and view that video after a delay of perhaps a few minutes. The computer adds no substantial benefit to this video-on-demand model over a VCR with each video on a tape; the user remains a passive observer of someone else’s produced material.

By contrast, the Informedia Project segments hours of video into logical pieces and indexes these pieces according to their raw content (dialog, images, narration). The users can actively explore the information by finding sections of content relevant to their search, rather than by following someone else’s path through the material (as one does when using the current generation of educational CD-ROMs) or by viewing a large chunk of pre-produced material (as with video on demand). Through the active, dynamic exploration supported by a deep, rich library and the indexing and retrieval capabilities of the computer, the user is more motivated and may learn more from the data set. Using such a library, a large body of video material can be searched with very little effort.
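The idea of segmenting video into logical pieces and indexing each piece by its content can be illustrated with a small sketch. The code below is not the Informedia implementation; it is a minimal inverted index over transcript text, assuming each segment already has a transcript and a time span. The names `Segment`, `tokenize`, `build_index` and `search`, and the sample transcripts, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical video segment: a time span plus its transcript text."""
    video_id: str
    start_s: float
    end_s: float
    transcript: str


def tokenize(text):
    """Yield lowercase alphanumeric words from the text."""
    word = []
    for ch in text.lower():
        if ch.isalnum():
            word.append(ch)
        elif word:
            yield "".join(word)
            word = []
    if word:
        yield "".join(word)


def build_index(segments):
    """Map each transcript word to the positions of the segments containing it."""
    index = defaultdict(set)
    for pos, seg in enumerate(segments):
        for tok in tokenize(seg.transcript):
            index[tok].add(pos)
    return index


def search(index, segments, query):
    """Return matching segments, ranked by number of distinct query words hit."""
    hits = defaultdict(int)
    for tok in set(tokenize(query)):
        for pos in index.get(tok, ()):
            hits[pos] += 1
    # Most matched words first; ties keep original segment order.
    ranked = sorted(hits, key=lambda pos: (-hits[pos], pos))
    return [segments[pos] for pos in ranked]


# Illustrative data only; the transcripts are invented for this sketch.
segments = [
    Segment("news-01", 0.0, 42.5, "The senate voted on the budget today"),
    Segment("news-01", 42.5, 97.0, "Heavy rain caused flooding in the valley"),
    Segment("news-02", 0.0, 30.0, "Budget talks continue as flooding relief is debated"),
]
index = build_index(segments)
results = search(index, segments, "budget flooding")
for seg in results:
    print(seg.video_id, seg.start_s, seg.end_s)
```

Only segments that contain at least one query word are returned, so a player built on top of this would show the matching time spans rather than whole videos. A real system would also need to cope with recognition errors in the transcripts, which this sketch ignores.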
Users are able to explore Informedia libraries through an interface that allows them to search using typed or spoken natural language queries, to select relevant documents retrieved from the library and to play or display the material on their PC workstations. The library retrieval system can effectively process natural spoken queries and deliver relevant video data in small video paragraphs, based on information associated with the video during library creation. Video and other data may be explored in depth for related content. During retrieval based on keyword searches by a user, only the query-relevant video segments are displayed.

The Informedia project is developing new technologies and embedding them in a video library system primarily for use in education and training. The Informedia project will establish an on-line digital video library consisting of over 1000 hours of video material. In order to be able to process this volume of data, practical, effective and efficient tools are essential. In the United States, schools and industry together spend between $400 and $600 billion per year on education and training, an activity that is 93% labor-intensive, with little change in teacher productivity ratios since the 1800s. The new digital video library technology will bring about a revolutionary improvement in the way education and training are delivered and received.

The initial Informedia test-bed system has been installed in a K-12 school, where students use the Informedia System to explore multimedia data for educational purposes. We plan to extend this test-bed to other Pittsburgh schools. During library creation for the test-bed, video material obtained from our Informedia Project Partners such as WQED/Pittsburgh and the British Open University is used. Our project plan calls for four test-bed installations with users ranging from grade school children to university faculty.
In addition, we will provide networked access to the primary test-bed, and export portions of the system and data to other sites for their local exploration and experimentation. The user tests will be conducted at Carnegie Mellon University, the Winchester Thurston School in Pittsburgh, the Fairfax County (VA) public school system, and with the Open University in the UK. Users will be of many different types, as we test the practicality of the concept of multimedia library search and the usability of the user interface for various age and interest groups.

Universal access to large amounts of low-cost digital information and entertainment will significantly affect the conduct of business, professional, and personal activity. The initial impact of the Informedia project’s activity will be to enable broad accessibility and reuse of existing video materials (e.g., documentaries, news, vocational, training) previously generated for public broadcast, public and professional education, and vocational, military and business training.

1.2 The Informedia: News-on-Demand Application

One compelling application branch of the Informedia project is the indexing and retrieval of television, radio and text news. The Informedia: News-on-Demand application [Hauptmann95] is an innovative example of indexing and searching broadcast news video and news radio material by its text content. News-on-Demand is a fully-automatic system that monitors TV, radio and text news and allows selective retrieval of news stories based on spoken queries. The user may choose among the retrieved stories and play back the news stories of interest. The system runs on a Pentium PC using MPEG-I video compression. Speech recognition is currently done on a separate platform using the Sphinx-II continuous speech recognition system [CMU-Speech95]. The News-on-Demand application forces us to consider the limits of what can be done automatically and in limited time.
Since news events happen daily, it is not feasible to process, segment and label news through manual or “human-assisted” methods. Immediate availability of the library information is important, as is continuous updating of the contents.

[Figure omitted; surviving labels: Digital Compression, Text, Library Creation]
Similar resources
Lessons for the Future from a Decade of Informedia Video Analysis Research
The overarching goal of the Informedia Digital Video Library project has been to achieve machine understanding of video media, including all aspects of search, retrieval, visualization and summarization in both contemporaneous and archival content collections. The base technology developed by the Informedia project combines speech, image and natural language understanding to automatically trans...
New Directions in Video Information Extraction and Summarization
The Informedia Digital Video Library project provided a technological foundation for full content indexing and retrieval of video and audio media. New directions for this research extend to: (1) search and retrieval in multilingual video corpora, (2) analysis and indexing of continuously captured, unstructured and unedited field-collected video, and (3) summarization of video-based content across ...
Topic Labeling of Multilingual Broadcast News in the Informedi
The Informedia Digital Video Library Project includes a multilingual component for retrieval of video documents in multiple languages and a topic-labeling component for English video documents. We now extend this capability to English topic labeling of foreign-language broadcast-news stories. News stories are coarsely machine-translated into English, then assigned to a topic category using a K-...
Story Segmentation and Detection of Commercials in Broadcast News Video
The Informedia Digital Library Project [Wactlar96] allows full content indexing and retrieval of text, audio and video material. Segmentation is an integral process in the Informedia digital video library. The success of the Informedia project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio and that we can seg...